Abstract

Over the past decade, high performance computational (HPC) clusters have
become mainstream in academic and industrial settings as accessible means of
computation. Throughout their proliferation, HPC security has been a secondary
concern to performance. It is evident, however, that ensuring HPC security
presents different challenges than the ones faced when dealing with traditional
networks. To design suitable security measures for high performance computing,
it is necessary to first realize the threats faced by such an environment. This
task can be accomplished by means of constructing a comprehensive threat
model. To our knowledge, no such threat model exists with regard to cluster
computing. In this paper, we explore the unique challenges of securing HPCs
and propose a threat model based on the classical Confidentiality, Integrity
and Availability security principles.

1 Introduction 



Cluster computing now constitutes over 60% of the top 500 high performance
computing resources in the world, with top-performing clusters such as IBM's
Blue Gene reaching peak speeds of over 180 TFlops across more than 65,000
nodes. HPCs are used for a variety of research and industry tasks, many of
which are either mission-critical or sensitive by nature, making clusters an
attractive target for industrial espionage or sabotage. Additionally, the
cluster itself is an attractive target for attackers because of its highly
desirable resources, such as powerful computational capacity, high-bandwidth
network connections and massive storage, which can be employed for a DoS
attack, brute force password cracking or illegitimate FTP servers. One such
incident occurred in the spring of 2004, when facilities at multiple
institutions were attacked.

Attempts to secure cluster computing environments currently suffer from the
lack of an integrated security approach that takes advantage of the intrinsic
properties of cluster computing. Applying traditional security measures to
individual nodes of the cluster is inadequate, as it fails to take into
consideration the overall context of the system. For example, a node may be
communicating on a port that appears legitimate to the security system running
on that node. If the security system were able to interface with the scheduler,
it would learn that no job is scheduled to run on that node, and therefore that
no communication should be taking place. To define a comprehensive approach to
securing a cluster, we must first strive to understand completely the threats
and security risks that are present in a cluster computing environment.
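
The scheduler example above can be made concrete with a short sketch. It is
only an illustration under stated assumptions: the Connection record, the
unexpected_connections function and the node names are hypothetical, and a
real deployment would obtain the list of busy nodes from whatever interface
its scheduler actually exposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One observed network connection (hypothetical record layout)."""
    node: str   # hostname of the compute node
    port: int   # local port carrying the traffic


def unexpected_connections(connections, scheduled_nodes):
    """Return the connections seen on nodes that have no job scheduled.

    A per-node security tool sees only the port; combining its view with
    the scheduler's view reveals traffic that should not exist at all.
    """
    busy = set(scheduled_nodes)
    return [c for c in connections if c.node not in busy]


# Illustrative data: node07 is talking although nothing is scheduled there.
observed = [Connection("node01", 5000), Connection("node07", 5000)]
running = {"node01"}
suspicious = unexpected_connections(observed, running)
```

Here the port on node07 looks identical to the legitimate one on node01; only
the scheduler context distinguishes the two.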

The best approach to analyzing the threats faced by a computing environment
is a systematic threat analysis, or threat modelling. Threat modelling
involves systematically identifying the assets in the system, creating an
architectural overview of the system, and identifying the threats at each
stage of the system. Once the threats are identified, a risk assessment is
performed to determine whether it is more efficient to mitigate each threat or
to accept the risk of it being exploited. Security mechanisms are then
developed to mitigate the threats that are determined to be unacceptable. This
process allows security engineers to identify which security measures are
necessary, so that the necessary mechanisms are put in place while the
unnecessary ones are left unimplemented.

The purpose of this paper is to effectively and comprehensively identify the
threats faced by cluster computing^. We chose the classical Confidentiality,
Integrity and Availability (CIA) security model, a well-accepted and
time-proven paradigm in the security community, as the basis for our threat
model. We discuss the unique aspects of a clustered computing network as
compared to traditional networks, and how the challenges these aspects present
fit within the framework of the CIA model. To the best of our knowledge, this
is the first attempt to use a structured CIA approach to create a
comprehensive threat model for clusters.

The remainder of the paper is organized as follows. In Section 2 we discuss
the unique aspects of cluster computing as they relate to security assurance.
In Section 3 we briefly survey the previous work on defining a comprehensive
cluster security model. We then discuss our threat model in Section 4 of the
paper, and offer a summary and some concluding remarks in Section 5.

^The risk assessment portion of the security analysis process depends in large part on the
workload and data sensitivity of each individual cluster, as well as details of its architecture,
and as such is best left for the cluster administrator to perform.






2 Security Challenges of Cluster Computing 



As high speed computing continues to shift from mainframe to commodity CPU 
clusters, it is important to note several emerging properties in this new com- 
puting environment, which directly contribute to the difficulty in maintaining a 
secure computing environment. 

2.1 Clusters are highly customizable 

Clusters can be thought of as high speed networks of commodity processors.
As such, there is no single definition of a typical cluster 'profile'.
Factors that vary among clusters include node quantity, CPU and chipset choice,
operating system, cluster management software (such as Rocks and OSCAR),
and interconnect (such as Infiniband, Myrinet and many others) [2].

2.2 Clusters are often highly heterogeneous 

Frequently, clusters are deployed incrementally, and with different components.
As such, clusters can often be found to contain a highly heterogeneous mix of
hardware devices and software configurations. This presents a security challenge
that is both local and distributed. On one hand, each distinct configuration has
unique needs in terms of patching and node hardening. On the other hand, lack
of homogeneity hinders deployment of an integrated security solution across the
whole cluster.

2.3 Performance-First mentality 

Clusters are designed to be a high performance computing tool. As such, it
is common practice to maximize the performance potential and accessibility
of clusters, often at the expense of security. One example is the practice of
exposing all of the cluster's nodes to public networks in order to allow users to
log in directly to computing nodes and run their jobs. The emergence of cluster
grid computing and on-demand computing has accentuated this problem, since
these paradigms require all nodes to be accessible to outside connections for
optimal effectiveness. Another example is disabling SSH authentication between
nodes to reduce job start-up time.

2.4 Edge-based security measures model 

As part of the performance-first paradigm described above, security measures
in clusters, such as secure login authentication, are often concentrated along
the edges of the cluster. As a result, once an attacker gains access to the
cluster, there may be no significant obstacles to what the attacker can do
inside.

2.5 Cluster size



As clusters grow in size, it becomes increasingly difficult to guarantee a
secure operating state in every one of the components. Increasing the size of
the cluster also increases the attack space and heightens the possibility that
a determined attacker can locate and exploit a weakness in the system to gain
access to one of the nodes. Given the security oversight discussed in the
previous section, finding one vulnerable node is often all an attacker needs.

These aspects clearly differentiate clusters from classical network security
realms. As a result, cluster security cannot be addressed in the same way
that traditional network security is addressed. Because a cluster presents a
very different set of challenges to both attackers and security engineers, and
offers a vastly different set of assets to potential attackers, cluster
security must be examined within the framework of a different threat model
than the one faced by traditional networks.

3 Previous Work 

To the best of our knowledge, no work has been done on developing a
comprehensive threat model for cluster computing. In [3], the authors define a
threat model for Grid computing at a very high level, without focusing on the
specific security challenges of clusters that are outlined in this work. Other
work presents three specific attacks on clusters. A further work addresses the
unique properties of cluster security and gives a limited threat model, along
with a proposal for an integrated cluster security tool.

4 Attack Identification 

To create a strong threat model of a system, several questions must be
addressed. The first step in developing a threat model is to identify the
potential assets of the system. An asset in threat modeling is defined as an
entity or a feature of a system that is of interest to the attacker; gaining
access to these assets is the motivation behind the attack. Assets, however,
are entities or features that legitimate users want as well, so it is not
practical to eliminate the motivation for attacks by eliminating the assets.
Secondly, we must identify potential entry points into the system which can be
exploited by the attacker in order to gain illegitimate access. Entry points
can be intentional, such as a public login script, or unintentional, such as an
open port or a buffer overflow vulnerability in a running library. Finally,
given the existing assets and entry points, we enumerate the attacks that can
be launched in order to gain access to each asset.

It is imperative to follow the above steps when designing a threat model.
Threat modeling requires a systematic and repeatable approach, which is not
achievable by simply brainstorming the question "How can I be attacked?".
One must consider not only the assets at risk and the vulnerabilities of the
system, but also non-technical questions, such as who is going to be launching
the attacks (defacers, industrial spies, script kiddies, etc.) and what the
motivation behind the attacks is (financial gain, access to computing
resources, etc.).

A few words on our attacker model. We refer to anyone who wishes to circumvent
the normal operating state of the cluster as an 'attacker'. This is a broad
category that can include a bored teenager hacking from home, a malicious
hacker attempting to steal confidential information from the cluster, a
disgruntled employee, or a legitimate but dishonest cluster user who is
attempting to get more cluster resources at the expense of other users. Though
these people may have very different goals in mind, we group them together as
potential attackers.

We choose to classify attacks on clusters using the
Confidentiality-Integrity-Availability threat classification. The CIA security
model is a seminal classification model in information assurance studies and
provides a more clear-cut separation than newer, alternative models such as
STRIDE.

Assets. As we have briefly mentioned already, a large-scale cluster is highly
attractive to attackers both for the data contained in it and for the physical
resources that it provides. We identify the following assets in the cluster
that an attacker might try to get access to.

• User login data 

• User job data 

• System logs 

• Scheduler 

• Storage systems 

• Intranode network fabric 

• Computing cycles 

• Network packets 

Entry Points. Given the challenges of securing clusters discussed in
Section 2, there are many entry points which an attacker might use to
compromise the cluster. These include:

• Known vulnerabilities in SSH 

• Remote cluster management software 

• Open ports 

• Stolen login information

• Rogue processes/rootkits



We now present the CIA threat model for cluster security. To do so, we describe
how the enumerated entry points can be used to gain access to cluster
resources and launch attacks against the Confidentiality, Integrity and
Availability of the cluster.

4.1 CIA Model 

The Confidentiality-Integrity-Availability model is an attractive way to
differentiate attacks. The three properties of the model are key aspects that
must be guaranteed in a secure computing system. Although there may be some
overlap in how an attack can be categorized (for example, a confidentiality
attack can also be an integrity attack), we explicitly assign each attack to a
single category.
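
One way to picture the single-category rule is a lookup table in which each
attack name appears exactly once. The sketch below is illustrative only: the
attack names are the ones used in the subsections that follow, while the
Category enum, the THREATS table and the threats_in helper are hypothetical
names introduced for this example.

```python
from enum import Enum


class Category(Enum):
    CONFIDENTIALITY = "confidentiality"
    INTEGRITY = "integrity"
    AVAILABILITY = "availability"


# Each attack is a dictionary key, so it can map to only one category.
THREATS = {
    "Snooping on External Network": Category.CONFIDENTIALITY,
    "Snooping on Internal Network": Category.CONFIDENTIALITY,
    "Scheduler/Metadata Compromise": Category.CONFIDENTIALITY,
    "Resource Subversion - Computational": Category.CONFIDENTIALITY,
    "Resource Subversion - Storage": Category.CONFIDENTIALITY,
    "Internal Network Packet Injection": Category.INTEGRITY,
    "Scheduler Tampering": Category.INTEGRITY,
    "Log Tampering": Category.INTEGRITY,
    "Data Tampering": Category.INTEGRITY,
    "Exhausting Log Space": Category.AVAILABILITY,
    "Exhausting Scratch Space": Category.AVAILABILITY,
    "Exhausting Storage Space": Category.AVAILABILITY,
    "Scheduling DoS": Category.AVAILABILITY,
}


def threats_in(category):
    """Return the names of all attacks assigned to the given category."""
    return sorted(name for name, cat in THREATS.items() if cat is category)
```

Scheduler Tampering shows the trade-off: it could also be read as an
availability attack, but the table records it once, under Integrity.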

4.1.1 Confidentiality 

Confidentiality ensures that only the entities authorized to read information
and access resources can do so. Confidentiality attacks focus on gaining access
to resources without having the proper authorization to do so. Gaining
illegitimate root access to the system represents the ultimate confidentiality
attack; however, an attacker may still learn much without being able to log in
to the cluster.

• Snooping on External Network. Cluster users are frequently allowed to 
submit jobs over the network. This presents a number of snooping oppor- 
tunities. An attacker may learn when a certain user is running a job (and 
on which node, if the user is allowed to connect to compute nodes directly). 
An attacker may attempt to capture user data being transferred onto the 
cluster, or correlate input/output transmissions to infer information about 
the size of the job the user is running. 

• Snooping on Internal Network. Messages on the internal cluster network
are often left unencrypted for efficiency reasons. An attacker who gains
access to the communication fabric, or who captures a node and puts it in
promiscuous mode, can easily intercept data and control packets.

• Scheduler/Metadata Compromise. If an attacker manages to get administrative
access to the head node of the cluster, the attacker will be able to examine
scheduler logs and job metadata to learn about currently running jobs and what
jobs users have run in the past.

• Resource Subversion - Computational. An attacker who has gained access
to a single cluster node may, unless the cluster is specifically configured
to disallow this, bypass the scheduler altogether and launch a job from a
shell. This grants the attacker unauthorized access to the computational
resources of the cluster, enabling illegitimate tasks such as hashing values
for an offline dictionary attack.

• Resource Subversion - Storage. Likewise, an attacker who has gained access
to a single cluster node may implicitly also gain access to the storage
resources of the cluster. The storage may be used to house and serve
warez, pornography or other illegal material. This is especially severe
if the attacker gains enough access to open an unprotected port on the
cluster through which such connections can be handled.
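
As an illustration of the internal snooping item above, the following sketch
checks whether a network interface has its promiscuous flag set. It assumes a
Linux node, where /sys/class/net/<iface>/flags holds the interface flags as a
hexadecimal string and IFF_PROMISC is the 0x100 bit; the function names are
hypothetical. An attacker with root access can hide this flag, so the check is
a hint rather than a guarantee.

```python
IFF_PROMISC = 0x100  # promiscuous-mode bit in Linux interface flags


def is_promiscuous(flags_text):
    """Interpret the hexadecimal text read from /sys/class/net/<iface>/flags."""
    return bool(int(flags_text.strip(), 16) & IFF_PROMISC)


def read_interface_flags(iface):
    """Read the raw flags text for an interface (Linux sysfs assumed)."""
    with open(f"/sys/class/net/{iface}/flags") as handle:
        return handle.read()


# Example values: the second one has the 0x100 bit set.
normal_flags = "0x1003\n"
sniffing_flags = "0x1103\n"
```

On a Linux node, is_promiscuous(read_interface_flags("eth0")) would apply the
check to a live interface; the interface name here is only an example.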

4.1.2 Integrity 

Integrity ensures that all modifications to the data are made by entities
that are authorized to make them. Integrity attacks violate this condition by
enabling modification by entities not approved to do so. Modification is
understood to mean creating, changing, appending, writing and deleting user
data and metadata.

• Internal Network Packet Injection. An attacker who gains access to the
internal network of the cluster can use it to send legitimate-looking packets
with incorrect data to other nodes, for example in an attempt to subvert a
computation.

• Scheduler Tampering. An attacker who has gained administrative access to 
the scheduler may tamper with it, in order to preempt other jobs running 
on the cluster (this can also be classified as an availability attack), or to 
give his own job higher priority. 

• Log Tampering. Cooperative clusters often allocate a quota of computing
cycles for each user. By tampering with logs, an attacker can modify other
people's remaining quotas or his own quota, if the attacker is a legitimate
user.

• Data Tampering. With sufficiently privileged access, the attacker can
modify user data on the storage nodes at will. In the absence of sufficient
backups, this can be particularly disastrous, since data residing on the
storage nodes is generally the result of hundreds of hours of computation.
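
To make the data tampering item concrete, the sketch below compares stored
data against previously recorded SHA-256 digests. It is an illustration, not
part of the threat model itself: the manifest layout and function names are
hypothetical, and the approach only helps under the assumption that the
manifest is kept somewhere the attacker cannot modify.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def modified_files(manifest, current):
    """Return names whose contents are missing or differ from the manifest.

    manifest maps a name to its recorded digest; current maps a name to the
    bytes found on storage now.
    """
    return sorted(
        name
        for name, recorded in manifest.items()
        if name not in current or digest(current[name]) != recorded
    )


# Illustrative data: results.dat was altered after the manifest was taken.
manifest = {"input.dat": digest(b"seed"), "results.dat": digest(b"42")}
on_disk = {"input.dat": b"seed", "results.dat": b"43"}
changed = modified_files(manifest, on_disk)
```

Detection of this kind does not restore the lost computation, which is why
the backup caveat above still applies.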

4.1.3 Availability 

Availability ensures that resources are available to the people who are
authorized to access them when they wish to access them. The goal of an
availability attack is to make the resource unavailable to the intended users,
which is known as a Denial of Service attack. Unlike Confidentiality and
Integrity, which have been extensively studied and modeled, availability is a
more fleeting property to define.

• Exhausting Log Space. Depending on the cluster configuration, exhausting
log space may be an effective availability attack if the cluster is configured
to reject any submitted jobs that it is unable to log.

• Exhausting Scratch Space. An improperly configured cluster may allow
users to store data on the same partition that is used for computation
scratch space by the cluster. Exhausting this space will leave the cluster
with insufficient disk storage to execute a job.

• Exhausting Storage Space. An attacker may attempt to completely fill
up the existing storage space, making it impossible for legitimate users to
store the results of their jobs.

• Scheduling DoS. An attacker may schedule a repetitive, non-expiring job
(such as a simple program with an infinite loop) to run on all the computing
nodes of the cluster, thus denying legitimate users the ability to launch
jobs.
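
The space exhaustion attacks above all come down to a partition running out
of room. The following sketch shows how a job launcher might test for
headroom before starting work. The function names, the 10 percent reserve and
the idea of checking before job start are assumptions made for illustration;
only shutil.disk_usage is a real standard-library call.

```python
import shutil


def enough_space(total, free, required, reserve_percent=10):
    """Decide whether a job needing `required` bytes fits on a partition.

    A fixed share of the partition is held back as a reserve, so that one
    user filling the disk cannot consume the last free bytes.
    """
    reserve = total * reserve_percent // 100
    return free - reserve >= required


def partition_has_headroom(path, required, reserve_percent=10):
    """Apply enough_space to the partition that contains `path`."""
    usage = shutil.disk_usage(path)
    return enough_space(usage.total, usage.free, required, reserve_percent)


# Illustrative numbers: 1000 bytes total, 200 free, 100 held in reserve.
fits = enough_space(1000, 200, 100)
too_big = enough_space(1000, 200, 101)
```

A check like this addresses only the symptom. Keeping user data off the
scratch and log partitions removes the entry point itself.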

5 Summary and Conclusions 

In this paper, we have presented a threat model for cluster computing. Com- 
puting clusters differ from traditional networks in design and approach, and 
present a unique security challenge compared to them. Additionally, a cluster 
has several properties which make it a highly desirable target for an attack. In 
their current state, many clusters have a shell of varying crunchiness on the 
outside, and a very soft, unprotected inside. We presented a CIA model which 
demonstrates that upon breaking into a cluster through a single node, there is 
very little limit imposed on what the attacker can do while inside, in terms of 
Confidentiality, Integrity and Availability. The inherent conclusion that can
be drawn is that while securing the individual nodes and preventing break-ins
is important, much effort needs to be put into securing the cluster from the
inside out, with the emergent properties of cluster computing in mind.